The Fundamentals

Published

July 2, 2025

In this workshop, we will learn about:

Maths and objects

The console (by default at the bottom left in RStudio) is where most of the action happens. In the console, we can use R interactively. We write a command and then execute it by pressing Enter.

In its most basic use, R can be a calculator. Try executing the following commands:

10 - 2
[1] 8
3 * 4
[1] 12
2 + 10 / 5
[1] 4

Those symbols are called “binary operators”: we can use them to multiply, divide, add and subtract. Once we execute the command (the “input”), we can see the result in the console (the “output”).

What if we want to keep reusing the same value? We can store data by creating objects, and assigning values to them with the assignment operator <-:

num1 <- 42
num2 <- num1 / 9
num2
[1] 4.666667

You can use the shortcut Alt+- to type the assignement operator quicker.

We can also store text data:

sentence <- "Hello World!"
sentence
[1] "Hello World!"

You should now see your objects listed in you environment pane (top right).

As you can see, you can store different kinds of data as objects. If you want to store text data (a “string of characters”), you have to use quotes around them.

You can recall your recent commands with the up arrow, which is especially useful to correct typos or slightly modify a long command.

Using the console is great to test things and quickly run commands and get outputs. However, if we want to store our process and refine our code as we go over several sessions, it is best to work with a script. Let’s do a bit more setting up of our project.

Create a folder structure

To keep it tidy, we are creating 3 folders in our project directory:

  • scripts
  • data
  • plots

You can do that with the “New Folder” button in the “Files” pane (bottom right of the window).

Scripts

Scripts are simple text files that contain R code. They are useful for:

  • saving a set of commands for later use (and executing it in one click)
  • making research reproducible
  • making writing and reading code more comfortable
  • documenting the code with comments, and
  • sharing your work with peers

Let’s create a new R script with the menu: File > New File > R Script. (This can also be done with the first icon in the toolbar, or with the shortcut Ctrl+Shift+N.)

This opens our fourth pane in the top left of RStudio: the source pane.

Comments

We should start with a couple of comments, to document our script. Comments start with #, and will be ignored by R:

# Description: Introduction to R and RStudio
# Author: <your name>
# Date: <today's date>

Syntax highlighting

Now, add some commands to your script:

num1 <- 42
num2 <- num1 / 9

Notice the colours? This is called syntax highlighting. This is one of the many ways RStudio makes it more comfortable to work with R. The code is more readable when working in a script.

While editing your script, you can run the current command (or the selected block of code) by using Ctrl+Enter. Remember to save your script regularly with the shortcut Ctrl+S. You can find more shortcuts with Alt+Shift+K, or the menu “Tools > Keyboard Shortcuts Help”.

Functions

An R function is a little program that does a particular job. It usually looks like this:

<functionname>(<argument(s)>)

Arguments tell the function what to do. Some functions don’t need arguments, others need one or several, but they always need the parentheses after their name.

For example, try running the following command:

round(num2)
[1] 5

The round() function rounds a number to the closest integer. The only argument we give it is num2, the number we want to round.

If you scroll back to the top of your console, you will now be able to spot functions in the text.

Help

What if we want to learn more about a function?

There are two main ways to find help about a specific function in RStudio:

  1. the shortcut command: ?functionname
  2. the keyboard shortcut: press F1 with your cursor in a function name (you can do this by simply clicking on the function name)

Let’s look through the documentation for the round() function:

?round

As you can see, different functions might share the same documentation page.

There is quite a lot of information in a function’s documentation, but the most important bits are:

  • Description: general description of the function(s)
  • Usage: overview of what syntax can be used
  • Arguments: description of what each argument is
  • Examples: some examples that demonstrate what is possible

See how the round() function has a second argument available? Try this now:

round(num2, digits = 2)
[1] 4.67

We can change the default behaviour of the function by telling it how many digits we want after the decimal point, using the argument digits. And if we use the arguments in order, we don’t need to name them:

round(num2, 2)
[1] 4.67

To group values together in a single object, use the c() function.

c() combines the arguments into a vector. In other words, it takes any number of arguments (hence the ...), and stores all those values together, as one single object. For example, let’s store the ages of our pet dogs in a new object:

ages <- c(4, 10, 2, NA, 3)

You can store missing data as NA.

We can now reuse this vector, and calculate their human age:

ages * 7
[1] 28 70 14 NA 21

R can create visualisations with functions too. Try a bar plot of your dogs’ ages with the barplot() function:

barplot(ages)

We can customise the plot with a title and some colours, for example:

barplot(ages, main = "How old are my dogs?", col = "pink")

Challenge 1 – Finding help

Use the help pages to find out what these functions do, and try executing commands with them:

  1. rep.int()
  2. mean()
  3. rm()

rep.int() creates vectors like c(), but it is designed to easily replicate values. For example, if you find something very funny:

rep.int("Ha!", 30)
 [1] "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!"
[13] "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!"
[25] "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!"

The next function, mean(), returns the mean of a vector of numbers:

mean(ages)
[1] NA

What happened there?

We have an NA value in the vector, which means the function can’t tell what the mean is. If we want to change this default behaviour, we can use an extra argument: na.rm, which stands for “remove NAs”.

mean(ages, na.rm = TRUE)
[1] 4.75

In our last command, if we hadn’t named the na.rm argument, R would have understood TRUE to be the value for the trim argument!

rm() removes an object from your environment (remove() and rm() point to the same function). For example:

rm(num1)

R does not check if you are sure you want to remove something! As a programming language, it does what you ask it to do, which means you might have to be more careful. But you’ll see later on that, when working with scripts, this is less of a problem.

Let’s do some more complex operations by combining two functions:

ls() returns a character vector: it contains the names of all the objects in the current environment (i.e. the objects we created in this R session). Notice that this function doesn’t require us to provide any argument, but we still need to write the parentheses to run the function.

Is there a way we could combine ls() with rm()?

You can remove all the objects in the environment by using ls() as the value for the list argument:

rm(list = ls())

We are nesting a function inside another one. More precisely, we are using the output of the ls() function as the value passed on to the list argument in the rm() function.

Incomplete functions

If you don’t finish a function, by leaving off the last bracket ) for example, the line of code won’t necessarily give you an error, but it won’t work very well. If you forget to include that last bracket, R will run the code, and then wait for further instructions before giving you an output. This will appear as a + in the console like so:

> round(1.23
+

If you try to give any further instructions to R, it will likely just continue giving you + symbols, and not return anything. To stop this, click on the console and press the Esc key on your keyboard.

More help

We’ve practised how to find help about functions we know the name of. What if we don’t know what the function is called? Or if we want general help about R?

  • The function help.start() is a good starting point: it opens a browser of official R help.
  • If you want to search for a word in all the documentation, you can use the ?? syntax. For example, try executing ??anova.
  • Finally, you will often go to your web browser and search for a particular question, or a specific error message: most times, there already is an answer somewhere on the Internet. The challenge is to ask the right question!

Import data

Let’s bring in some data. Download our gapminder dataset and save it in the data/ folder you just created.

Once you’ve got the data, use the read.csv() command to bring it into R:

gapminder <- read.csv("data/gapminder.csv")

What do you think they do? Describe each one in detail, and try executing them.

Explore data

We have downloaded a CSV file from the Internet, and read it into an object called gapminder.

You can type the name of your new object to print it to screen:

gapminder

That’s a lot of lines printed to your console. To have a look at the first few lines only, we can use the head() function:

head(gapminder)
      country year      pop continent lifeExp gdpPercap
1 Afghanistan 1952  8425333      Asia  28.801  779.4453
2 Afghanistan 1957  9240934      Asia  30.332  820.8530
3 Afghanistan 1962 10267083      Asia  31.997  853.1007
4 Afghanistan 1967 11537966      Asia  34.020  836.1971
5 Afghanistan 1972 13079460      Asia  36.088  739.9811
6 Afghanistan 1977 14880372      Asia  38.438  786.1134

Now let’s use a few functions to learn more about our dataset:

class(gapminder) # what kind of object is it stored as?
[1] "data.frame"
nrow(gapminder) # how many rows?
[1] 1704
ncol(gapminder) # how many columns?
[1] 6
dim(gapminder) # rows and columns
[1] 1704    6
names(gapminder) # variable names
[1] "country"   "year"      "pop"       "continent" "lifeExp"   "gdpPercap"

All the information we just saw (and more) is available with one single function:

str(gapminder) # general structure
'data.frame':   1704 obs. of  6 variables:
 $ country  : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
 $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
 $ continent: chr  "Asia" "Asia" "Asia" "Asia" ...
 $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
 $ gdpPercap: num  779 821 853 836 740 ...

The RStudio’s environment panel already shows us some of that information (click on the blue arrow next to the object name).

And to explore the data in a viewer, click on the table icon next to the object in the Environment pane.

This viewer allows you to explore your data by scrolling through, searching terms, filtering rows and sorting the data. Remember that it is only a viewer: it will never modify your original object.

Notice that RStudio actually runs the View() function. Feel free to use that instead of clicking on the button, but note that the case matters: using a lowercase “v” will yield an error.

To see summary statistics for each of our variables, you can use the summary() function:

summary(gapminder)
   country               year           pop             continent        
 Length:1704        Min.   :1952   Min.   :6.001e+04   Length:1704       
 Class :character   1st Qu.:1966   1st Qu.:2.794e+06   Class :character  
 Mode  :character   Median :1980   Median :7.024e+06   Mode  :character  
                    Mean   :1980   Mean   :2.960e+07                     
                    3rd Qu.:1993   3rd Qu.:1.959e+07                     
                    Max.   :2007   Max.   :1.319e+09                     
    lifeExp        gdpPercap       
 Min.   :23.60   Min.   :   241.2  
 1st Qu.:48.20   1st Qu.:  1202.1  
 Median :60.71   Median :  3531.8  
 Mean   :59.47   Mean   :  7215.3  
 3rd Qu.:70.85   3rd Qu.:  9325.5  
 Max.   :82.60   Max.   :113523.1  

Notice how categorical and numerical variables are handled differently?

Let’s now plot the relationship between GDP per capita and life expectancy:

plot(gapminder$gdpPercap, gapminder$lifeExp,
     xlab = "GDP per capita (USD)",
     ylab = "Life expectancy (years)")

For more on visualisations, we will later dive into the popular ggplot2 package.

Finally, let’s fit a linear model to see how strongly correlated the two variables are:

linear_model <- lm(gapminder$lifeExp ~ gapminder$gdpPercap)
summary(linear_model)

Call:
lm(formula = gapminder$lifeExp ~ gapminder$gdpPercap)

Residuals:
    Min      1Q  Median      3Q     Max 
-82.754  -7.758   2.176   8.225  18.426 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)         5.396e+01  3.150e-01  171.29   <2e-16 ***
gapminder$gdpPercap 7.649e-04  2.579e-05   29.66   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.49 on 1702 degrees of freedom
Multiple R-squared:  0.3407,    Adjusted R-squared:  0.3403 
F-statistic: 879.6 on 1 and 1702 DF,  p-value: < 2.2e-16

The P-value suggests that there is a strong relationship between the two.

Packages

Packages add functionalities to R and RStudio. There are more than 21000 available.

You can see the list of installed packages in your “Packages” tab, or by using the library() function without any argument.

We are going to install a package called “skimr”. We can do that in the Packages tab:

  1. Open the “Packages” tab (bottom-right pane)
  2. Click the “Install” button
  3. Search for “skimr”
  4. Click “Install”

Notice how it runs an install.packages() command in the console? You can use that too.

If I now try running the command skim(), I get an error. That’s because, even though the package is installed, I need to load it every time I start a new R session. The library() function does that. Let’s load the package, and use the skim() function to get an augmented summary of our gapminder dataset:

library(skimr) # load the package into your library
skim(gapminder) # use a function from the package
Data summary
Name gapminder
Number of rows 1704
Number of columns 6
_______________________
Column type frequency:
character 2
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
country 0 1 4 24 0 142 0
continent 0 1 4 8 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1 1979.50 17.27 1952.00 1965.75 1979.50 1993.25 2007.0 ▇▅▅▅▇
pop 0 1 29601212.33 106157896.75 60011.00 2793664.00 7023595.50 19585221.75 1318683096.0 ▇▁▁▁▁
lifeExp 0 1 59.47 12.92 23.60 48.20 60.71 70.85 82.6 ▁▆▇▇▇
gdpPercap 0 1 7215.33 9857.45 241.17 1202.06 3531.85 9325.46 113523.1 ▇▁▁▁▁

This function provides further summary statistics, and even displays a small histogram for each numeric variable.

Packages are essential to use R to its full potential, by making the most out of what other users have created and shared with the community. To get an idea of some of the most important packages depending on your field of study, you can start with the CRAN Task Tiews.

Challenge 3

For a bit of fun:

  • Try installing the package “cowsay” and using its function say().
  • Have a look at the documentation and the package’s website.
  • Can you make Clippy say the current time?
  • Can you make a chicken say “bok” a thousand times? (Hint: look at the paste() function and its arguments.)

Closing RStudio

You can close RStudio after making sure that you saved your script.

When you create a project in RStudio, you create an .Rproj file that gathers information about the state of your project. When you close RStudio, you have the option to save your workspace (i.e. the objects in your environment) as an .Rdata file. The .Rdata file is used to reload your workspace when you open your project again. Projects also bring back whatever source file (e.g. script) you had open, and your command history. You will find your command history in the “History” tab (upper right panel): all the commands that we used should be in there.

If you have a script that contains all your work, it is a good idea not to save your workspace: it makes it less likely to run into errors because of accumulating objects. The script will allow you to get back to where you left it, by executing all the clearly laid-out steps.

The console, on the other hand, only shows a brand new R session when you reopen RStudio. Sessions are not persistent, and a clean one is started when you open your project again, which is why you have to load any extra package your work requires again with the library() function.

Resources